Abstract. In this paper, we give quantitative bounds on the $f$-total variation
distance from convergence of a Harris recurrent Markov chain on an arbitrary
state space under drift and minorisation conditions implying ergodicity at a
sub-geometric rate. These bounds are then specialized to the stochastically
monotone case, covering the case where there is no minimal reachable element.
The results are illustrated on two examples from queueing theory and Markov
Chain Monte Carlo.

AMS 2000 MSC 60J10 

Stochastic monotonicity; rates of convergence; Markov chains 

1. Introduction 

Let $P$ be a Markov transition kernel on a state space $X$ equipped with a
countably generated $\sigma$-field $\mathcal{X}$. For a control function
$f : X \to [1,\infty)$, the $f$-total variation or $f$-norm of a signed
measure $\mu$ on $\mathcal{X}$ is defined as

$$ \|\mu\|_f := \sup_{|g| \le f} |\mu(g)| . $$

When $f \equiv 1$, the $f$-norm is the total variation norm, which is denoted
$\|\mu\|_{TV}$. We assume that $P$ is aperiodic positive Harris recurrent with
stationary distribution $\pi$. Our goal is to obtain quantitative bounds on
convergence rates, i.e. bounds of the form

$$ r(n)\,\|P^n(x,\cdot) - \pi\|_f \le g(x) , \quad \text{for all } x \in X , \tag{1.1} $$

where $f$ is a control function $f : X \to [1,\infty)$, $\{r(n)\}_{n \ge 0}$ is
a non-decreasing sequence, and $g : X \to [0,\infty]$ is a function which can
be computed explicitly. As emphasized in (Roberts and Rosenthal, 2004, section
3.5), quantitative bounds have a substantial history in Markov chain theory.
Applications are numerous, including convergence analysis of Markov Chain Monte
Carlo (MCMC) methods, transient analysis of queueing systems or storage models,
etc. With few exceptions however, these quantitative bounds were derived under
conditions implying geometric convergence.
In this paper, we study conditions under which (1.1) holds for sequences in
the set $\Lambda$ of subgeometric rate functions from Nummelin and Tuominen
(1983), defined as the family of sequences $\{r(n)\}_{n \ge 0}$ such that
$r(n)$ is non-decreasing and $\log r(n)/n \downarrow 0$ as $n \to \infty$.
Without loss of generality, we assume that $r(0) = 1$ whenever $r \in \Lambda$.
These rates of convergence have been only scarcely considered in the
literature. Let us briefly summarize the results available for convergence at
subgeometric rate for general state-space chains. To our best knowledge, the
first result for subgeometric sequences has been obtained by Nummelin and
Tuominen (1983), who derive sufficient conditions for $\|\xi P^n - \pi\|_{TV}$
to be of order $o(r^{-1}(n))$. The basic condition involved in this work is the
ergodicity of order $r$ (or $r$-ergodicity), defined as

$$ \sup_{x \in B} \mathbb{E}_x\Big[ \sum_{k=0}^{\tau_B - 1} r(k) \Big] < \infty , \tag{1.2} $$

where $\tau_B := \inf\{n \ge 1, X_n \in B\}$ (with the convention that
$\inf \emptyset = \infty$) is the return time to some accessible small set $B$
(i.e. $\pi(B) > 0$). These results were later extended by Tuominen and Tweedie
(1994) to $f$-norm for general control functions $f : X \to [1,\infty)$ under
$(f,r)$-ergodicity, which states that

$$ \sup_{x \in B} \mathbb{E}_x\Big[ \sum_{k=0}^{\tau_B - 1} r(k) f(X_k) \Big] < \infty , \tag{1.3} $$

for some accessible small set $B$. These contributions do not provide computable
expressions for the bounds in (1.1).

A direct route to quantitative bounds for subgeometric sequences has been
opened by Veretennikov (1997, 1999), based on coupling techniques (see Gulinsky
and Veretennikov (1993) and Rosenthal (1995) for the coupling construction of
Harris recurrent Markov chains). This method consists in relating the bounds
(1.1) to a moment of the coupling time through Lindvall's inequality (Lindvall,
1979, 1992). Veretennikov (1997, 1999) focuses on a particular class of Markov
chains, the so-called functional autoregressive processes, defined as
$X_{n+1} = g(X_n) + W_{n+1}$, where $g : \mathbb{R}^d \to \mathbb{R}^d$ is a
Borel function and $(W_n)_{n \ge 0}$ is an i.i.d. sequence, and provides
expressions of the bounds in (1.1) with the total variation distance
($f \equiv 1$) and polynomial rate functions $r(n) = n^{\beta}$, $\beta > 1$.
These results have later been extended, using similar techniques, to truly
subgeometric sequences, i.e. $\{r(n)\}_{n \ge 0} \in \Lambda$ satisfying
$\lim_{n \to \infty} r(n) n^{-\kappa} = \infty$ for any $\kappa$, in Klokov and
Veretennikov (2002), for a more general class of functional autoregressive
processes.

Fort and Moulines (2003b) derived quantitative bounds of the form (1.1) for
possibly unbounded control functions and polynomial rate functions, also using
the coupling method. The bound for the modulated moment of the coupling time
is obtained from a particular drift condition introduced by Fort and Moulines
(2000), later extended by Jarner and Roberts (2001). This method is based on
a recursive computation of the polynomial moment of the coupling time (see
(Fort and Moulines, 2003a, Proposition 7)) which is related to the moments of
the hitting time of a bivariate chain to a set where coupling might occur. This
proof is tailored to the polynomial case and cannot be easily adapted to the
general subgeometric case (see Fort (2001) for comments).

The objective of this paper is to generalize the results mentioned above in
two directions. We consider Markov chains over general state space and we
study general subgeometrical rates of convergence instead of polynomial rates
(Fort and Moulines, 2003b). We establish a family of convergence bounds (with a
trade-off between the rate and the norm) extending to the subgeometrical case
the computable bounds obtained in the geometrical case by Rosenthal (1995)
and later refined by Roberts and Tweedie (1999) and Douc et al. (2004b) (see
(Roberts and Rosenthal, 2004, Theorem 12) and the references therein). The
method, based on coupling, provides a short and nearly self-contained
proof of the results presented in Nummelin and Tuominen (1983) and Tuominen
and Tweedie (1994): this allows for intuitive understanding of these results,
while also avoiding various analytic technicalities of the previous proofs of
these theorems.
The paper is organized as follows. In section 2, we present our assumptions and
state our main results. In section 2.1 we specialize our results to
stochastically monotone Markov chains and derive bounds which extend results
reported earlier by Scott and Tweedie (1996) and Roberts and Tweedie (2000).
Examples from queueing theory and MCMC are discussed in section 3 to support
our findings and illustrate the numerical computations of the bounds.

2. Statements of the results 

The proof is based on the coupling construction (briefly recalled in section 4).
It is assumed that the chain admits a small set:

(A1) There exist a set $C \in \mathcal{X}$, a constant $\epsilon > 0$ and a
     probability measure $\nu$ such that, for all $x \in C$,
     $P(x,\cdot) \ge \epsilon\,\nu(\cdot)$.

For simplicity, only one-step minorisation is considered in this paper.
Adaptations to $m$-step minorisation can be carried out as in Rosenthal (1995)
(see also Fort (2001) and Fort and Moulines (2003b)).

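Let $\check P$ be a Markov transition kernel on $X \times X$ such that, for all
$A \in \mathcal{X}$,

$$ \check P(x,x'; A \times X) = P(x,A)\,1_{(C \times C)^c}(x,x') + Q(x,A)\,1_{C \times C}(x,x') , \tag{2.1} $$
$$ \check P(x,x'; X \times A) = P(x',A)\,1_{(C \times C)^c}(x,x') + Q(x',A)\,1_{C \times C}(x,x') , \tag{2.2} $$

where $A^c$ denotes the complement of the subset $A$ and $Q$ is the so-called
residual kernel defined, for $x \in C$ and $A \in \mathcal{X}$, by

$$ Q(x,A) = \begin{cases} (1-\epsilon)^{-1}\{P(x,A) - \epsilon\,\nu(A)\} , & 0 < \epsilon < 1 , \\ \nu(A) , & \epsilon = 1 . \end{cases} \tag{2.3} $$

One may for example set

$$ \check P(x,x'; A \times A') = P(x,A)P(x',A')\,1_{(C \times C)^c}(x,x') + Q(x,A)Q(x',A')\,1_{C \times C}(x,x') , \tag{2.4} $$

but, as seen below, this choice is not always the most suitable. For
$(x,x') \in X \times X$, denote by $\check{\mathbb{P}}_{x,x'}$ and
$\check{\mathbb{E}}_{x,x'}$ the law and the expectation of a Markov chain with
initial distribution $\delta_x \otimes \delta_{x'}$ and transition kernel
$\check P$.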

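Our second condition is a bound on the moment of the hitting time of the
bivariate chain to $C \times C$ under the probability
$\check{\mathbb{P}}_{x,x'}$. Let $\{r(n)\} \in \Lambda$ be a subgeometric
sequence and set $R(n) := \sum_{k=0}^{n-1} r(k)$. Denote by
$\sigma_{C \times C} := \inf\{n \ge 0, (X_n, X'_n) \in C \times C\}$ the first
hitting time of $C \times C$ and let

$$ U(x,x') := \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} r(k) \Big] . \tag{2.5} $$

Let $v : X \times X \to [0,\infty)$ be a measurable function and set

$$ V(x,x') := \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} v(X_k, X'_k) \Big] . \tag{2.6} $$

(A2) For any $(x,x') \in X \times X$, $U(x,x') < \infty$ and

$$ b_U := \sup_{(x,x') \in C \times C} \check P U(x,x') = \sup_{(x,x') \in C \times C} \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\tau_{C \times C} - 1} r(k) \Big] < \infty . \tag{2.7} $$

(A3) For any $(x,x') \in X \times X$, $V(x,x') < \infty$ and

$$ b_V := \sup_{(x,x') \in C \times C} \check P V(x,x') = \sup_{(x,x') \in C \times C} \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=1}^{\tau_{C \times C}} v(X_k, X'_k) \Big] < \infty . \tag{2.8} $$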



We will establish that $R$ is the maximal rate of convergence (that can be
deduced from assumptions (A1)-(A3)) and that this rate is associated to
convergence in total variation norm. On the other hand, we will show that the
difference $P^n(x,\cdot) - P^n(x',\cdot)$ remains bounded in $f$-norm for any
function $f$ satisfying $f(x) + f(x') \le V(x,x')$ for any
$(x,x') \in X \times X$. Using an interpolation technique, we will derive rates
of convergence $1 \le s \le r$ associated to some $g$-norm, $0 \le g \le f$. To
construct such an interpolation, we consider pairs of positive functions
$(\alpha,\beta)$ satisfying, for some $0 \le \rho \le 1$,

$$ \alpha(u)\beta(v) \le \rho u + (1-\rho) v , \quad \text{for all } (u,v) \in \mathbb{R}^+ \times \mathbb{R}^+ . \tag{2.9} $$

Functions satisfying this condition can be obtained from Young's inequality. Let
$\psi$ be a real valued, continuous, strictly increasing function on
$\mathbb{R}^+$ such that $\psi(0) = 0$; then for any $(a,b) > 0$,

$$ ab \le \Psi(a) + \Phi(b) , \quad \text{where } \Psi(a) = \int_0^a \psi(x)\,dx \text{ and } \Phi(b) = \int_0^b \psi^{-1}(x)\,dx , $$

where $\psi^{-1}$ is the inverse function of $\psi$. If we set
$\alpha(u) := \Psi^{-1}(\rho u)$ and $\beta(v) := \Phi^{-1}((1-\rho)v)$, then
the pair $(\alpha,\beta)$ satisfies (2.9). Taking $\psi(x) = x^{p-1}$ for some
$p > 1$ gives the special case
$\{(p\rho u)^{1/p}, (p(1-\rho)v/(p-1))^{(p-1)/p}\}$.
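
As a sanity check of the special case above, the following Python sketch builds the pair $(\alpha,\beta)$ obtained from $\psi(x) = x^{p-1}$ and tests inequality (2.9) on random points. The function names `young_pair` and `check_interpolation`, the sampling range and the numerical tolerance are illustrative assumptions, not part of the paper.

```python
import random


def young_pair(p: float, rho: float):
    """Return (alpha, beta) built from psi(x) = x**(p - 1), with p > 1, 0 < rho < 1."""
    def alpha(u: float) -> float:
        # Inverse of Psi(a) = a**p / p, evaluated at rho * u.
        return (p * rho * u) ** (1.0 / p)

    def beta(v: float) -> float:
        # Inverse of Phi(b) = (p - 1) / p * b**(p / (p - 1)), evaluated at (1 - rho) * v.
        return (p * (1.0 - rho) * v / (p - 1.0)) ** ((p - 1.0) / p)

    return alpha, beta


def check_interpolation(p: float, rho: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Test alpha(u) * beta(v) <= rho*u + (1-rho)*v on random points of [0, 100]^2."""
    rng = random.Random(seed)
    alpha, beta = young_pair(p, rho)
    for _ in range(trials):
        u, v = rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)
        # Small tolerance (an assumption) to absorb floating-point rounding.
        if alpha(u) * beta(v) > rho * u + (1.0 - rho) * v + 1e-9:
            return False
    return True
```

With $p = 2$ and $\rho = 1/2$ the pair reduces to $\alpha(u) = \sqrt{u}$, $\beta(v) = \sqrt{v}$, and (2.9) is the arithmetic-geometric mean inequality.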
Theorem 2.1. Assume (A1)-(A3). Define

$$ M_U := \sup_{k \ge 0} \Big( b_U\, r(k)\, \frac{1-\epsilon}{\epsilon} - R(k+1) \Big)^+ \quad \text{and} \quad M_V := b_V\, \frac{1-\epsilon}{\epsilon} , \tag{2.10} $$

where $(x)^+ := \max(x, 0)$. Then, for any $(x,x') \in X \times X$,

$$ \|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le V(x,x') + M_V , \tag{2.12} $$

for any non-negative function $f$ satisfying, for any $(x,x') \in X \times X$,
$f(x) + f(x') \le V(x,x') + M_V$. Let $(\alpha,\beta)$ be two positive
functions satisfying (2.9) for some $0 \le \rho \le 1$. Then, for any
$(x,x') \in X \times X$ and $n \ge 1$,

$$ \|P^n(x,\cdot) - P^n(x',\cdot)\|_g \le \frac{\rho\,(U(x,x') + M_U) + (1-\rho)\,(V(x,x') + M_V)}{\alpha \circ \{R(n) + M_U\}} , \tag{2.13} $$

for any non-negative function $g$ satisfying, for any $(x,x') \in X \times X$,
$g(x) + g(x') \le \beta \circ \{V(x,x') + M_V\}$.

The proof is postponed to section 4.

Remark 1. Because the sequence $\{r(k)\}$ is subgeometric,
$\lim_{k \to \infty} r(k)/R(k+1) = 0$. Therefore, the sequence
$\{b_U r(k)(1-\epsilon)/\epsilon - R(k+1)\}$ has only finitely many
non-negative terms, which implies that $M_U < \infty$.



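Remark 2. When assumption (A2) holds, then (A3) is automatically satisfied for
some function $v$. Note that

$$ \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} r(k) \Big] = \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} r(\sigma_{C \times C} - k) \Big] . $$

On the other hand, for all $(x,x') \in X \times X$ and $k \ge 0$,

$$ \check{\mathbb{E}}_{x,x'}\big[ r(\sigma_{C \times C} - k)\, 1\{\sigma_{C \times C} \ge k\} \big] = \check{\mathbb{E}}_{x,x'}\big[ \check{\mathbb{E}}_{X_k,X'_k}[r(\sigma_{C \times C})]\, 1\{\sigma_{C \times C} \ge k\} \big] = \check{\mathbb{E}}_{x,x'}\big[ v_r(X_k, X'_k)\, 1\{\sigma_{C \times C} \ge k\} \big] , $$

where $v_r(x,x') := \check{\mathbb{E}}_{x,x'}[r(\sigma_{C \times C})]$. This
relation implies that

$$ \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} r(k) \Big] = \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} v_r(X_k, X'_k) \Big] , \quad \text{for all } (x,x') \in X \times X . $$

However, in particular when using drift functions, it is sometimes easier to
apply Theorem 2.1 with a function $v$ which does not coincide with $v_r$.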



To check assumptions (A2) and (A3) it is often useful to use drift conditions.
Drift conditions implying convergence at polynomial rates have been recently
proposed in Jarner and Roberts (2001). These conditions have later been
extended to general subgeometrical rates by Douc et al. (2004a). Define by
$\mathcal{C}$ the set of functions

$$ \mathcal{C} := \big\{ \phi : [1,\infty) \to \mathbb{R}^+ , \ \phi \text{ is concave, differentiable and } \phi(1) > 0 , \ \lim_{v \to \infty} \phi(v) = \infty , \ \lim_{v \to \infty} \phi'(v) = 0 \big\} . \tag{2.14} $$

For $\phi \in \mathcal{C}$, define the function
$H_\phi : [1,\infty) \to [0,\infty)$ as $H_\phi(v) := \int_1^v dx/\phi(x)$.
Since $\phi$ is non-decreasing, $H_\phi$ is a non-decreasing concave
differentiable function on $[1,\infty)$ and
$\lim_{v \to \infty} H_\phi(v) = \infty$. The inverse
$H_\phi^{-1} : [0,\infty) \to [1,\infty)$ is also an increasing and
differentiable function, with derivative
$(H_\phi^{-1})' = \phi \circ H_\phi^{-1}$. Note that
$(\log\{\phi \circ H_\phi^{-1}\})' = \phi' \circ H_\phi^{-1}$. Since
$H_\phi^{-1}$ is increasing and $\phi'$ is decreasing,
$\phi \circ H_\phi^{-1}$ is log-concave, which implies that the sequence

$$ r_\phi(n) := \phi \circ H_\phi^{-1}(n) / \phi \circ H_\phi^{-1}(0) \tag{2.15} $$

belongs to the set of subgeometric sequences $\Lambda$. Consider the following
assumption:

(A4) There exist a function $W : X \times X \to [1,\infty)$, a function
     $\phi \in \mathcal{C}$ and a constant $b$ such that
     $\check P W(x,x') \le W(x,x') - \phi \circ W(x,x')$ for
     $(x,x') \notin C \times C$ and
     $\sup_{(x,x') \in C \times C} \check P W(x,x') < \infty$.

It is shown in Douc et al. (2004a) that under (A4), (A2) and (A3) are satisfied
with the rate sequence $r_\phi$ and the control function $v = \phi \circ W$. In
addition, it is possible to deduce explicit bounds for $U$, $b_U$, $V$ and
$b_V$ from the constants appearing in the drift condition.
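
To make the rate sequence (2.15) concrete, here is a minimal Python sketch for the power-type choice $\phi(v) = c\,v^{a}$ with $c > 0$ and $0 < a < 1$, for which $H_\phi$ and its inverse are available in closed form. This choice of $\phi$, and the helper names `make_rate` and `log_rate_over_n`, are assumptions made for illustration only.

```python
import math


def make_rate(c: float, a: float):
    """Rate r_phi of (2.15) for the illustrative choice phi(v) = c * v**a, 0 < a < 1."""
    def phi(v: float) -> float:
        return c * v ** a

    def h_inv(t: float) -> float:
        # Inverse of H_phi(v) = (v**(1 - a) - 1) / (c * (1 - a)).
        return (1.0 + c * (1.0 - a) * t) ** (1.0 / (1.0 - a))

    def rate(n: float) -> float:
        # r_phi(n) = phi(H_phi^{-1}(n)) / phi(H_phi^{-1}(0)), so that r_phi(0) = 1.
        return phi(h_inv(n)) / phi(h_inv(0.0))

    return rate


def log_rate_over_n(rate, n_max: int):
    """Values log r(n) / n for n = 1..n_max; they should decrease to 0 for r in Lambda."""
    return [math.log(rate(n)) / n for n in range(1, n_max + 1)]
```

For this family the rate is polynomial, $r_\phi(n) = (1 + c(1-a)n)^{a/(1-a)}$; for instance $c = 1$, $a = 1/2$ gives $r_\phi(n) = 1 + n/2$.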



8 RANDAL DOUC*, ERIC MOULINES, AND PHILIPPE SOULIER 

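Proposition 2.2. Assume (A4). Then, (A2) and (A3) hold with
$v = \phi \circ W$, $r = r_\phi$ and

$$ U(x,x') \le 1 + \frac{r_\phi(1)}{\phi(1)} \{W(x,x') - 1\}\, 1_{(C \times C)^c}(x,x') , \tag{2.16} $$
$$ V(x,x') \le \sup_{C \times C} \phi \circ W + W(x,x')\, 1_{(C \times C)^c}(x,x') , \tag{2.17} $$
$$ b_U \le 1 + \frac{r_\phi(1)}{\phi(1)} \Big( \sup_{C \times C} \check P W - 1 \Big) , \tag{2.18} $$
$$ b_V \le \sup_{C \times C} \phi \circ W + \sup_{C \times C} \check P W . \tag{2.19} $$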

The proof is in section 5. Proposition 2.2 is only partially satisfactory
because Assumption (A4) is formulated on the bivariate kernel $\check P$. It is
in general easier to establish directly the drift condition on the kernel $P$
and to deduce from this condition a drift condition for an appropriately
defined kernel $\check P$ (see (Roberts and Rosenthal, 2004, Proposition 11)
for a similar construction for geometrically ergodic Markov chains). Consider
the following assumption:

(A5) There exist a function $W_0 : X \to [1,\infty)$, a function
     $\phi_0 \in \mathcal{C}$ and a constant $b_0$ such that
     $P W_0 \le W_0 - \phi_0 \circ W_0 + b_0 1_C$.

Theorem 2.3. Suppose that (A1) and (A5) are satisfied. Let
$d_0 := \inf_{x \notin C} W_0(x)$. Then, if $\phi_0(d_0) > b_0$, the kernel
$\check P$ defined in (2.4) satisfies the bivariate drift condition (A4) with

$$ W(x,x') = W_0(x) + W_0(x') - 1 , \tag{2.20} $$
$$ \phi = \lambda \phi_0 , \quad \text{for any } \lambda , \ 0 < \lambda < 1 - b_0/\phi_0(d_0) , \tag{2.21} $$
$$ \sup_{C \times C} \check P W \le 2(1-\epsilon)^{-1} \Big\{ \sup_C P W_0 - \epsilon\,\nu(W_0) \Big\} - 1 , \tag{2.22} $$

where the kernel $Q$ is defined in (2.3).

The proof is postponed to section 5.

Remark 3. Since the function $\phi_0$ is non-decreasing and
$\lim_{v \to \infty} \phi_0(v) = \infty$, one may always find $d$ such that the
condition $\phi_0(1) + \phi_0(d) > b_0 (1-a)^{-1} + 2$ is fulfilled. The
assumptions of the theorem above are satisfied provided that the associated
level set $\{W_0 \le d\}$ is small. This will happen of course if all the level
sets are 1-small, which may appear to be a rather strong requirement. More
realistic conditions may be obtained by using small sets associated to the
iterate $P^m$ of the kernel (see e.g. Rosenthal (1995), Fort (2001) and Fort
and Moulines (2003b)).

2.1. Stochastically ordered chains. In this section, we show how to define the
kernel $\check P$ and obtain a drift condition for stochastically ordered
Markov chains. Let $X$ be a totally ordered set, and denote by $\preceq$ the
order relation. For $a \in X$, denote
$(-\infty, a] = \{x \in X : x \preceq a\}$ and
$[a, +\infty) = \{x \in X : a \preceq x\}$. A transition kernel $P$ on $X$ is
called stochastically monotone if for all $a \in X$, $P(\cdot, (-\infty, a])$
is non-increasing. Stochastic monotonicity has been seen to be crucial in the
analysis of queueing networks, Markov Monte-Carlo methods, storage models, etc.
Stochastically ordered Markov chains have been considered in Lund and Tweedie
(1996), Lund et al. (1996), Scott and Tweedie (1996) and Roberts and Tweedie
(2000). In the first two papers, it is assumed that there exists an atom at the
bottom of the state space. Lund et al. (1996) cover only geometric convergence;
subgeometric rates of convergence are considered in Scott and Tweedie (1996).
Roberts and Tweedie (2000) cover the case where the bottom of the space is a
small set but restrict their attention to conditions implying geometric rates
of convergence.

For a general stochastically monotone Markov kernel $P$, it is always possible
to define the bivariate kernel $\check P$ (see (2.1)) so that the two
components $\{X_n\}_{n \ge 0}$ and $\{X'_n\}_{n \ge 0}$ are pathwise ordered,
i.e. their initial order is preserved at all times.

The construction goes as follows. For $x \in X$, $u \in [0,1]$ and $K$ a
transition kernel on $X$, denote by $G_K^{-}(x,u)$ the quantile function
associated to the probability measure $K(x,\cdot)$,

$$ G_K^{-}(x,u) = \inf\{ y \in X , \ K(x, (-\infty, y]) \ge u \} . \tag{2.23} $$

Assume that (A1) holds. For $(x,x') \in X \times X$ and
$A \in \mathcal{X} \otimes \mathcal{X}$, define the transition kernel
$\check P$ by

$$ \check P(x,x'; A) = 1_{(C \times C)^c}(x,x') \int_0^1 1_A\big( G_P^{-}(x,u), G_P^{-}(x',u) \big)\, du + 1_{C \times C}(x,x') \int_0^1 1_A\big( G_Q^{-}(x,u), G_Q^{-}(x',u) \big)\, du , $$


where $Q$ is the residual kernel defined in (2.3). It is easily seen that, by
construction, the set $\{(x,x') \in X \times X : x \preceq x'\}$ is absorbing
for the kernel $\check P$.
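
The order-preserving property of the quantile coupling can be illustrated on a finite ordered state space $\{0, \dots, m-1\}$. The Python sketch below covers only the part of the construction driven by $P$ (outside $C \times C$); the list-of-rows kernel representation and the function names are assumptions made for the example, and the residual kernel $Q$ is not handled.

```python
import random
from itertools import accumulate


def quantile(row, u: float) -> int:
    """Smallest state y whose cumulative probability under `row` is at least u, as in (2.23)."""
    for y, cum in enumerate(accumulate(row)):
        if cum >= u:
            return y
    return len(row) - 1  # guard against rounding when u is very close to 1


def is_stochastically_monotone(kernel) -> bool:
    """Check that x -> K(x, (-inf, a]) is non-increasing in x for every a."""
    cdfs = [list(accumulate(row)) for row in kernel]
    n = len(kernel)
    return all(cdfs[x][a] >= cdfs[x + 1][a] - 1e-12
               for x in range(n - 1) for a in range(n))


def coupled_step(kernel, x: int, x_prime: int, rng: random.Random):
    """Move both components using the same uniform draw (quantile coupling)."""
    u = rng.random()
    return quantile(kernel[x], u), quantile(kernel[x_prime], u)
```

When the kernel passes `is_stochastically_monotone`, two chains started at $x \preceq x'$ and driven by `coupled_step` remain ordered at every step, which is the absorbing property stated above.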

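In the sequel, we assume that (A1) holds for some $C := (-\infty, x_0]$ (i.e.
that there is a small set at the bottom of the space). Let
$v_0 : X \to [1,\infty)$ be a measurable function and define:

$$ U_0(x) := \mathbb{E}_x\Big[ \sum_{k=0}^{\sigma_C} r(k) \Big] \quad \text{and} \quad V_0(x) := \mathbb{E}_x\Big[ \sum_{k=0}^{\sigma_C} v_0(X_k) \Big] . \tag{2.24} $$

Consider the following assumptions:

(B2) For any $x \in X$, $U_0(x) < \infty$ and
     $\sup_C Q U_0 =: b_{U_0} < \infty$,
(B3) For any $x \in X$, $V_0(x) < \infty$ and
     $\sup_C Q V_0 =: b_{V_0} < \infty$.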



Theorem 2.4. Assume that (A1), (B2) and (B3) hold for some set
$C = (-\infty, x_0]$. Then, (A2) and (A3) hold with
$U(x,x') = U_0(x \vee x')$, $V(x,x') = V_0(x \vee x')$,
$v(x,x') = v_0(x \vee x')$, $b_U = b_{U_0}$ and $b_V = b_{V_0}$.

The proof is obvious and omitted for brevity. As mentioned above, drift
conditions often provide an easy path to prove conditions such as (B2) and
(B3). Consider the following assumption:

(B4) There exist a nonnegative function $W_0 : X \to [1,\infty)$ and a function
     $\phi \in \mathcal{C}$ such that for $x \notin C$,
     $P W_0 \le W_0 - \phi \circ W_0$ and $\sup_C P W_0 < \infty$.

Using, as above, Douc et al. (2004a), it may be shown that this assumption
implies (B2) and (B3) and allows the constants to be computed explicitly.

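Theorem 2.5. Assume (A1) and (B4). Then (B2) and (B3) hold with
$v_0 = \phi \circ W_0$, $r = r_\phi$, and

$$ U_0(x) \le 1 + \frac{r_\phi(1)}{\phi(1)} \{W_0(x) - 1\}\, 1_{C^c}(x) , \tag{2.25} $$
$$ V_0(x) \le \sup_C \phi \circ W_0 + W_0(x)\, 1_{C^c}(x) , \tag{2.26} $$
$$ b_{U_0} \le 1 + \frac{r_\phi(1)}{\phi(1)} \Big[ (1-\epsilon)^{-1} \Big\{ \sup_C P W_0 - \epsilon\,\nu(W_0) \Big\} - 1 \Big] , \tag{2.27} $$
$$ b_{V_0} \le \sup_C \phi \circ W_0 + (1-\epsilon)^{-1} \Big\{ \sup_C P W_0 - \epsilon\,\nu(W_0) \Big\} . \tag{2.28} $$

The proof is entirely similar to that of Proposition 2.2 and is omitted.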
3.1. The embedded M/G/1 queue. In an M/G/1 queue, customers arrive into
a service operation according to a Poisson process with parameter $\lambda$.
Customers bring jobs requiring service times which are independent of each
other and of the inter-arrival times, with common distribution $B$ concentrated
on $(0,\infty)$ (we assume that the service time distribution has no
probability mass at 0). Consider the random variable $X_n$ which counts
customers immediately after each service time ends. $\{X_n\}_{n \ge 0}$ is a
Markov chain on the integers with transition matrix

$$ P = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 & \cdots \\ a_0 & a_1 & a_2 & a_3 & \cdots \\ 0 & a_0 & a_1 & a_2 & \cdots \\ 0 & 0 & a_0 & a_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \tag{3.1} $$

where for each $j \ge 0$,
$a_j := \int_0^\infty \{ e^{-\lambda t} (\lambda t)^j / j! \}\, dB(t)$ (see
(Meyn and Tweedie, 1993, Proposition 3.3.2)). It is known that $P$ is
irreducible, aperiodic, and positive recurrent if
$\rho := \lambda m_1 = \sum_{j=1}^\infty j a_j < 1$, where for $n > 0$,
$m_n := \int t^n\, dB(t)$. Applying the results derived above, we will compute
explicit bounds (depending on $\lambda$, $x$ and the moments of the service
time distribution) for the convergence bound $\|P^n(x,\cdot) - \pi\|_f$ for
some appropriately defined function $f$.

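Because the chain is irreducible and positive recurrent, $\tau_0 < \infty$
$\mathbb{P}_x$-a.s. for $x \in \mathbb{N}$. By construction, for all
$x = 1, 2, \dots$, $\tau_{x-1} \le \tau_0$, $\mathbb{P}_x$-a.s., which implies
that $\mathbb{E}_x[\tau_0] = \mathbb{E}_x[\tau_{x-1}] + \mathbb{E}_{x-1}[\tau_0]$
and, for any $s \in \mathbb{C}$ such that $|s| \le 1$,
$\mathbb{E}_x[s^{\tau_0}] = \mathbb{E}_x[s^{\tau_{x-1}}]\, \mathbb{E}_{x-1}[s^{\tau_0}]$,
where $\tau_{x-1}$ is the first return time to the state $x - 1$. For all
$x = 1, 2, \dots$, we have
$\mathbb{P}_x\{\tau_{x-1} \in \cdot\} = \mathbb{P}_1\{\tau_0 \in \cdot\}$, which
shows that $\mathbb{E}_x[\tau_0] = x\, \mathbb{E}_1[\tau_0]$ and
$\mathbb{E}_x[s^{\tau_0}] = e^x(s)$, where $e(s) := \mathbb{E}_1[s^{\tau_0}]$.
This relation implies

$$ e(s) = s\, a_0 + s \sum_{y=1}^\infty a_y\, e^y(s) = s \int_0^\infty e^{\lambda (e(s) - 1) t}\, dB(t) . $$

By differentiating the previous relation with respect to $s$ and taking the
limit as $s \to 1$, we obtain $\mathbb{E}_1[\tau_0] = (1-\rho)^{-1}$. Since
$\{0, 1\}$ is an atom, we may use Theorem 2.4 with $C = \{0, 1\}$,
$r \equiv 1$ and $v_0 \equiv 1$. In this case

$$ U_0(x) = V_0(x) = 1 + \mathbb{E}_x[\sigma_C] = 1 + \mathbb{E}_{x-1}[\tau_0]\, 1_{\{x \ge 2\}} = 1 + (1-\rho)^{-1} (x - 1)\, 1_{\{x \ge 2\}} . $$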
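
The identities $\rho = \sum_j j a_j$ and $\mathbb{E}_1[\tau_0] = (1-\rho)^{-1}$ can be checked numerically in the special case of exponential service times with rate $\mu$, where $a_j$ is geometric and the Laplace transform is $G(z) = \mu/(\mu+z)$. This exponential special case, the function names, the fixed-point iteration and the finite-difference step are all assumptions chosen for illustration; the paper itself treats a general service distribution.

```python
def arrival_probs_exponential(lam: float, mu: float, n_terms: int):
    """a_j for exponential(mu) service times: a_j = (mu/(lam+mu)) * (lam/(lam+mu))**j."""
    q = lam / (lam + mu)
    return [(1.0 - q) * q ** j for j in range(n_terms)]


def return_time_pgf(s: float, lam: float, mu: float, iters: int = 10_000) -> float:
    """Solve e = s * G(lam * (1 - e)) by fixed-point iteration, with G(z) = mu / (mu + z)."""
    e = 0.0
    for _ in range(iters):
        e = s * mu / (mu + lam * (1.0 - e))
    return e


def mean_return_time(lam: float, mu: float, h: float = 1e-6) -> float:
    """Backward finite-difference estimate of E_1[tau_0] = e'(1)."""
    return (return_time_pgf(1.0, lam, mu) - return_time_pgf(1.0 - h, lam, mu)) / h


def u0_atom(x: int, rho: float) -> float:
    """U_0(x) = 1 + (x - 1) / (1 - rho) for x >= 2 and 1 otherwise, with C = {0, 1}."""
    return 1.0 + (x - 1) / (1.0 - rho) if x >= 2 else 1.0
```

With $\lambda = 1$ and $\mu = 2$ the traffic intensity is $\rho = 1/2$, and `mean_return_time(1.0, 2.0)` should be close to $(1-\rho)^{-1} = 2$, up to the finite-difference error.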
Theorem 2.1 shows that, for any $(x,x') \in \mathbb{N} \times \mathbb{N}$ and
any functions $\alpha$ and $\beta$ satisfying (2.9),

$$ \alpha(n)\, \|P^n(x,\cdot) - P^n(x',\cdot)\|_\beta \le 1 + (1-\rho)^{-1} (x \vee x' - 1)\, 1_{\{x \vee x' \ge 2\}} . $$

Convergence bounds $\alpha(n)\, \|P^n(x,\cdot) - \pi\|_\beta$ can be obtained
by integrating the previous relation in $x'$ with respect to the stationary
distribution $\pi$ (which can be computed using the Pollaczek-Khinchine
formula).

It is possible to choose the set $C$ in a different way, leading to different
bounds. One may set for example $C = \{0, \dots, x_0\}$, for some $x_0 \ge 2$.
For simplicity, assume that the sequence $\{a_j\}_{j \ge 0}$ is non-increasing.
In this case, for all $x \in C$ and $y \in \mathbb{N}$,
$P(x,y) = a_{y-x+1} 1_{\{y \ge x-1\}} \ge a_y 1_{\{y \ge x_0 - 1\}}$ and the
set $C$ satisfies (A1) with $\epsilon = \sum_{y = x_0 - 1}^\infty a_y$ and
$\nu(y) = \epsilon^{-1} a_y 1_{\{y \ge x_0 - 1\}}$. Taking again
$r(k) \equiv 1$ and $v_0(x) \equiv 1$, we have

$$ U_0(x) = V_0(x) = 1 + \mathbb{E}_x[\sigma_C]\, 1_{C^c}(x) = 1 + \mathbb{E}_x[\tau_{x_0}]\, 1_{C^c}(x) = 1 + \mathbb{E}_{x - x_0}[\tau_0]\, 1_{C^c}(x) = 1 + (1-\rho)^{-1} (x - x_0)\, 1_{C^c}(x) . $$

To apply the results of Theorem 2.4 we finally compute a bound for
$b_{U_0} = \sup_C Q U_0 = (1-\epsilon)^{-1} [\sup_C P U_0 - \epsilon\,\nu(U_0)]$,
which can be obtained by combining a bound for $\sup_C P U_0$ and the
expression of $\nu(U_0)$. An expression of $\nu(U_0)$ is computed by a direct
application of the definitions. The bound for $\sup_C P U_0$ is obtained by
noting that, for all $y > x_0$ and $x \in C$,
$P(x,y) \le P(x_0,y) = a_{y - x_0 + 1}$, which implies

$$ P U_0(x) = \mathbb{E}_x[\tau_C] = 1 + \mathbb{E}_x\big[ \mathbb{E}_{X_1}[\tau_C]\, 1_{\{\tau_C > 1\}} \big] = 1 + \mathbb{E}_x\big[ \mathbb{E}_{X_1}[\tau_{x_0}]\, 1_{\{X_1 > x_0\}} \big] $$
$$ = 1 + (1-\rho)^{-1} \sum_{y = x_0 + 1}^\infty (y - x_0) P(x,y) \le 1 + (1-\rho)^{-1} \sum_{y = x_0 + 1}^\infty (y - x_0)\, a_{y - x_0 + 1} . $$

We provide some numerical illustrations of the bounds described above. We use
the distribution of service time suggested by Roughan et al. (1998), given by

$$ b(x) = \begin{cases} \alpha B^{-1} e^{-\alpha x / B} , & x \le B , \\ \alpha B^{\alpha} e^{-\alpha} x^{-(\alpha + 1)} , & x > B , \end{cases} \tag{3.2} $$

where $B$ marks where the tail begins. The mean of the service distribution is
$m_1 = B \{1 + e^{-\alpha}/(\alpha - 1)\}/\alpha$ and its Laplace transform,
$G(s) = \int e^{-st}\, dB(t)$, $s \in \mathbb{C}$, $\mathrm{Re}(s) \ge 0$, is
given by

$$ G(s) = \alpha\, \frac{1 - e^{-(sB + \alpha)}}{sB + \alpha} + \alpha B^{\alpha} e^{-\alpha} s^{\alpha}\, \Gamma(-\alpha, sB) , $$

where $\Gamma(x,z)$ is the incomplete $\Gamma$ function. The probability
generating function $P_\pi(z)$ of the stationary distribution is given by the
Pollaczek-Khinchine formula

$$ P_\pi(z) = \frac{(1-\rho)(z - 1)\, G(\lambda(1 - z))}{z - G(\lambda(1 - z))} . $$
In figures 1 and 2 we display the convergence bound
$\|P^n(x,\cdot) - \pi\|_{TV}$ as a function of the iteration index $n$, for
$x = 10$, $\alpha = 2.5$, different choices of the small set upper limit
$x_0 = 1, 3, 6$, and two different values of the traffic $\rho = 0.5$ (light
traffic) and $\rho = 0.9$ (heavy traffic). Perhaps surprisingly, the bound
computed using the atom $C = \{0, 1\}$ is not better uniformly in the iteration
index $n$. There is a trade-off between the number of visits to the small set
where coupling might occur and the probability that coupling is successful. In
the heavy traffic case ($\rho = 0.9$), the queue is not very often empty, so
the atom is not frequently visited, explaining why deriving the convergence
bound from a larger coupling set improves the bound (this effect is even more
noticeable for a critically loaded system).

Insert figures 1 and 2 approximately here



3.2. The Independence Sampler. This second example is borrowed from Jarner and
Roberts (2001). It is an example of a Markov chain which is stochastically
monotone w.r.t. a non-standard ordering of the state and does not have an atom
at the bottom of the state-space.

The purpose of the Metropolis-Hastings Independence Sampler is to sample
from a probability density $\pi$ (with respect to some $\sigma$-finite measure
$\mu$ on $X$), which is known only up to a scale factor. At each iteration, a
move is proposed according to a distribution with density $q$ with respect to
$\mu$. The move is accepted with probability
$a(x,y) := \frac{q(x)\pi(y)}{\pi(x)q(y)} \wedge 1$. The transition kernel of
the algorithm is thus given by

$$ P(x,A) = \int_A a(x,y)\, q(y)\, \mu(dy) + 1_A(x) \int_X \big( 1 - a(x,y) \big)\, q(y)\, \mu(dy) , \quad x \in X , \ A \in \mathcal{X} . $$
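
A single transition of this kernel can be sketched in Python for the case treated later in this section, where the target $\pi$ is uniform on $[0,1]$ and the proposal density is $q(x) = (r+1)x^r$. The function names and the inverse-CDF draw from $q$ are assumptions of this sketch rather than anything prescribed by the text.

```python
import random


def proposal_density(x: float, r: float) -> float:
    """q(x) = (r + 1) * x**r on [0, 1], zero elsewhere."""
    return (r + 1.0) * x ** r if 0.0 <= x <= 1.0 else 0.0


def accept_prob(x: float, y: float, r: float) -> float:
    """a(x, y) = min(1, q(x) pi(y) / (pi(x) q(y))) with pi uniform on [0, 1]."""
    qy = proposal_density(y, r)
    if qy == 0.0:
        return 1.0  # ratio is infinite; such proposals have probability zero under q
    return min(1.0, proposal_density(x, r) / qy)


def sampler_step(x: float, r: float, rng: random.Random) -> float:
    """One move of the independence sampler started at x."""
    # Inverse-CDF draw from q: the proposal CDF on [0, 1] is y**(r + 1).
    y = rng.random() ** (1.0 / (r + 1.0))
    return y if rng.random() < accept_prob(x, y, r) else x
```

Moves towards points with a smaller ratio $q/\pi$ are always accepted, while moves towards a larger ratio are accepted with probability $q(x)\pi(y)/(\pi(x)q(y))$; this is the mechanism behind the ordering used for stochastic monotonicity.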

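It is well known that the independence sampler is stochastically monotone with
respect to the ordering:
$x' \preceq x \Leftrightarrow \frac{q(x')}{\pi(x')} \ge \frac{q(x)}{\pi(x)}$.
Without loss of generality, it is assumed that $\pi(x) > 0$ for all $x \in X$
and that $q > 0$ $\pi$-a.s. For all $\eta > 0$, define the set

$$ C_\eta := \Big\{ x \in X : \frac{q(x)}{\pi(x)} \ge \eta \Big\} . $$

For any $\eta > 0$, we assume that $0 < \pi(C_\eta) < 1$ and we denote by
$\nu_\eta(\cdot)$ the probability measure
$\nu_\eta(\cdot) = \pi(\cdot \cap C_\eta)/\pi(C_\eta)$. For any
$x \in C_\eta$,

$$ P(x,A) \ge \int_A \Big( \frac{q(x)}{\pi(x)} \wedge \frac{q(y)}{\pi(y)} \Big)\, \pi(y)\, \mu(dy) \ge \eta\, \pi(A \cap C_\eta) , $$

showing that the set $C_\eta$ satisfies (A1) with $\nu = \nu_\eta$ and
$\epsilon = \eta\, \pi(C_\eta)$.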

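Proposition 3.1. Assume that there exists a decreasing differentiable function
$K : (0,\infty) \to (1,\infty)$, whose inverse is denoted by $K^{-1}$,
satisfying

(1) the function $\phi(v) = v K^{-1}(v)$ is differentiable, increasing and
    concave on $[1,\infty)$, $\lim_{v \to \infty} \phi(v) = \infty$, and
    $\lim_{v \to \infty} \phi'(v) = 0$.

(2) $\int_0^\infty u K(u)\, d\psi(u) < \infty$, where for $\eta > 0$,
    $\psi(\eta) := 1 - \pi(C_\eta)$.

Then, for any $\eta^*$ satisfying

$$ \int_0^\infty (\eta^* \wedge u) K(u)\, d\psi(u) < \{1 - \psi(\eta^*)\}\, \phi(1) , $$

assumption (B4) is satisfied with $W_0 = K \circ (q/\pi)$, $C = C_{\eta^*}$ and

$$ \phi_0(v) = \{1 - \psi(\eta^*)\}\, \phi(v) - \int_0^\infty (\eta^* \wedge u) K(u)\, d\psi(u) . $$

In addition,

$$ \sup_{x \in C_{\eta^*}} P W_0(x) \le \int_0^{+\infty} u K(u)\, d\psi(u) + K(\eta^*) . $$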



To illustrate our results, we evaluate the convergence bounds in the case where
the target density $\pi$ is the uniform distribution on $[0,1]$ and the
proposal density is $q(x) = (r+1) x^r 1_{[0,1]}(x)$. Proposition 3.1 provides a
means to derive a drift condition of the form
$P W_0 \le W_0 - \phi \circ W_0$ outside some small set $C$ for functions
$\phi \in \mathcal{C}$ of the form $\phi(v) = c v^{1 - 1/\alpha} + d$ for any
$\alpha \in [1, 1 + 1/r)$. In this case, the function $\psi$ is given by
$\psi(\eta) = (\eta/(r+1))^{1/r}$, for $\eta \in [0, r+1]$ and
$\psi(\eta) = 1$ otherwise. We set, for $u \in [0, r+1]$,
$K(u) = (u/(r+1))^{-\alpha}$. The integral
$\int u K(u)\, d\psi(u) = (r+1)/(1 + r - r\alpha)$ is finite provided that
$\alpha < 1 + 1/r$. The function
$\phi(v) = v K^{-1}(v) = v^{1 - 1/\alpha}(r+1)$ belongs to $\mathcal{C}$
provided that $\alpha > 1$.

Using these results, it is now straightforward to evaluate the constants in
Theorem 2.5; this can be employed to calculate a bound on exactly how many
iterations are necessary to get within a prespecified total variation distance
of the target distribution. In figures 3 and 4 we have displayed the total
variation bounds to convergence for the instrumental densities $q(x) = 3x^2$
($r = 2$) and $q(x) = (3/2)\sqrt{x}$ ($r = 1/2$). We have taken
$\alpha = 1.1$ and $\eta^* = 0.25$ for $r = 2$, and $\alpha = 1.5$ and
$\eta^* = 0.5$ for $r = 1/2$. When $(r = 2, \alpha = 1.1)$ the convergence to
stationarity is quite slow, which is not surprising since the instrumental
density does not match the target density well at $x = 0$: according to our
computable bounds, 500 iterations are required to get the total variation
distance to the stationary distribution below 0.1. When $r = 1/2$, the
degeneracy of the instrumental density at zero is milder and the convergence
rate is significantly faster: fewer than 50 iterations are required to reach
the same bound.

4. Proof of Theorem 2.1

The proof is based on the pathwise coupling construction. For
$(x,x') \in X \times X$ and $A \in \mathcal{X} \otimes \mathcal{X}$, define the
coupling kernel $\bar P$ as follows:

$$ \bar P(x,x',0; A \times \{0\}) = \big( 1 - \epsilon 1_{C \times C}(x,x') \big)\, \check P(x,x'; A) , $$
$$ \bar P(x,x',0; A \times \{1\}) = \epsilon 1_{C \times C}(x,x')\, \nu\big( \{ y \in X : (y,y) \in A \} \big) , $$
$$ \bar P(x,x',1; A \times \{0\}) = 0 , $$
$$ \bar P(x,x',1; A \times \{1\}) = \int P(x,dy)\, 1_A(y,y) . $$

For any $(x,x') \in X \times X$, denote by $\bar{\mathbb{P}}_{x,x',0}$ and
$\bar{\mathbb{E}}_{x,x',0}$ the probability measure and the expectation
associated to the Markov chain $\{(X_n, X'_n, d_n)\}_{n \ge 0}$ with transition
kernel $\bar P$ starting from $(X_0, X'_0, d_0) = (x, x', 0)$. In words, the
coupling construction proceeds as follows. If $d_n = 0$ and
$(X_n, X'_n) \notin C \times C$, we draw $(X_{n+1}, X'_{n+1})$ according to
$\check P(X_n, X'_n; \cdot)$ and set $d_{n+1} = 0$. If $d_n = 0$ and
$(X_n, X'_n) \in C \times C$, we toss a coin with probability of heads
$\epsilon$. If the coin comes up heads, then we draw $X_{n+1}$ from $\nu$ and
set $X'_{n+1} = X_{n+1}$ and $d_{n+1} = 1$ (the coupling is said to be
successful); if the coin comes up tails, then we draw $(X_{n+1}, X'_{n+1})$
from $\check P(X_n, X'_n; \cdot)$ and we set $d_{n+1} = 0$. Finally, if
$d_n = 1$, we draw $X_{n+1}$ from $P(X_n, \cdot)$ and set
$X'_{n+1} = X_{n+1}$ and $d_{n+1} = 1$.
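
The verbal description above translates directly into a simulation step. The Python sketch below works on a finite state space and uses the independent bivariate kernel (2.4); the list-of-rows representation of $P$, the function names, and the clamping of tiny negative residual weights caused by rounding are assumptions made for this illustration, which also assumes $0 < \epsilon < 1$.

```python
import random


def draw(weights, rng: random.Random) -> int:
    """Draw an index with probabilities proportional to `weights`."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def residual_row(kernel, x: int, eps: float, nu):
    """Q(x, .) = (P(x, .) - eps * nu) / (1 - eps) for x in C and 0 < eps < 1, as in (2.3)."""
    # max(0, .) only guards against negative values produced by rounding.
    return [max(0.0, p - eps * m) / (1.0 - eps) for p, m in zip(kernel[x], nu)]


def coupling_step(kernel, small_set, eps: float, nu, state, rng: random.Random):
    """One transition of (X_n, X'_n, d_n) under the coupling kernel."""
    x, x_prime, d = state
    if d == 1:
        # Already coupled: both components move together according to P.
        y = draw(kernel[x], rng)
        return y, y, 1
    if x in small_set and x_prime in small_set:
        if rng.random() < eps:
            # Heads: successful coupling, common draw from nu.
            y = draw(nu, rng)
            return y, y, 1
        # Tails: independent moves from the residual kernel, as in (2.4).
        return (draw(residual_row(kernel, x, eps, nu), rng),
                draw(residual_row(kernel, x_prime, eps, nu), rng), 0)
    # Outside C x C: independent moves from P, as in (2.4).
    return draw(kernel[x], rng), draw(kernel[x_prime], rng), 0
```

Once the flag $d_n$ switches to 1 the two components coincide forever, which is what makes $1\{d_n = 0\}$ the natural quantity in the coupling inequality that follows.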

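By construction, for any $n$, $(x,x') \in X \times X$ and
$(A, A') \in \mathcal{X} \times \mathcal{X}$,

$$ \bar{\mathbb{P}}_{x,x',0}\big( (X_n, X'_n, d_n) \in A \times X \times \{0,1\} \big) = \bar{\mathbb{P}}_{x,x',0}(X_n \in A) = P^n(x,A) \quad \text{and} $$
$$ \bar{\mathbb{P}}_{x,x',0}\big( (X_n, X'_n, d_n) \in X \times A' \times \{0,1\} \big) = \bar{\mathbb{P}}_{x,x',0}(X'_n \in A') = P^n(x',A') . $$

By (Douc et al., 2004b, Lemma 1), we may relate the expectations of functionals
under the two probability measures $\bar{\mathbb{P}}_{x,x',0}$ and
$\check{\mathbb{P}}_{x,x'}$, where $\check{\mathbb{P}}_{x,x'}$ is defined in
(2.1): for any non-negative adapted process $(\chi_k)_{k \ge 0}$ and
$(x,x') \in X \times X$,

$$ \bar{\mathbb{E}}_{x,x',0}\big[ \chi_n 1_{\{T > n\}} \big] = \check{\mathbb{E}}_{x,x'}\big[ \chi_n (1-\epsilon)^{N_{n-1}} \big] , \tag{4.1} $$

where $T := \inf\{k \ge 1 : d_k = 1\}$ is the coupling time and $N_n$ is the
number of visits to the set $C \times C$ before time $n$,

$$ N_n = \sum_{j=0}^\infty 1_{\{\sigma_j \le n\}} = \sum_{j=0}^n 1_{C \times C}(X_j, X'_j) . \tag{4.2} $$

Let $f : X \to [0,\infty)$ and let $g : X \to \mathbb{R}$ be any Borel
function such that $\sup_{x \in X} |g(x)|/f(x) < \infty$. The classical
coupling inequality (see e.g. (Thorisson, 2000, Chapter 2, section 3)) implies
that

$$ |P^n(x,g) - P^n(x',g)| = \big| \bar{\mathbb{E}}_{x,x',0}[ g(X_n) - g(X'_n) ] \big| \le \sup_{x \in X} |g(x)|/f(x)\ \bar{\mathbb{E}}_{x,x',0}\big[ (f(X_n) + f(X'_n))\, 1_{\{d_n = 0\}} \big] , $$

and (4.1) shows the following key coupling inequality:

$$ \|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le \check{\mathbb{E}}_{x,x'}\big[ (f(X_n) + f(X'_n)) (1-\epsilon)^{N_{n-1}} \big] . \tag{4.3} $$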

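By definition, $\alpha(u)\beta(v) \le \rho u + (1-\rho) v$ for all
$(u,v) \in \mathbb{R}^+ \times \mathbb{R}^+$. Hence, for any non-negative
function $f$ satisfying $f(x) + f(x') \le \beta \circ V(x,x')$ for all
$(x,x') \in X \times X$, the coupling inequality (4.3) shows that

$$ \alpha \circ \{R(n) + M_U\}\, \|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le \alpha \circ \{R(n) + M_U\}\, \check{\mathbb{E}}_{x,x'}\big[ \{f(X_n) + f(X'_n)\} (1-\epsilon)^{N_{n-1}} \big] $$
$$ \le \rho \{R(n) + M_U\}\, \check{\mathbb{E}}_{x,x'}\big[ (1-\epsilon)^{N_{n-1}} \big] + (1-\rho)\, \check{\mathbb{E}}_{x,x'}\big[ V(X_n, X'_n) (1-\epsilon)^{N_{n-1}} \big] . $$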
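Set for any $n \ge 0$,
$U_n(x,x') := \check{\mathbb{E}}_{x,x'}\big[ \sum_{k=0}^{\sigma_{C \times C}} r(k + n) \big]$.
It is well known that $\{U_n\}_{n \ge 0}$ satisfies the sequence of drift
equations

$$ \check P U_{n+1} \le U_n - r(n) + b_U\, r(n)\, 1_{C \times C} . \tag{4.4} $$

Similarly, $\check P V \le V - v + b_V 1_{C \times C}$. Define for $n \ge 0$,

$$ W_n^{(0)} := U_n(X_n, X'_n) + \sum_{k=0}^{n-1} r(k) + M_U , $$
$$ W_n^{(1)} := V(X_n, X'_n) + \sum_{k=0}^{n-1} v(X_k, X'_k) + M_V , $$

with the convention $\sum_{k=u}^{v} = 0$ when $u > v$.

Since by construction, for any $n \ge 1$, $W_n^{(0)} \ge R(n) + M_U$ and
$W_n^{(1)} \ge V(X_n, X'_n)$, the previous inequality implies

$$ \alpha \circ \{R(n) + M_U\}\, \|P^n(x,\cdot) - P^n(x',\cdot)\|_f \le \rho\, \check{\mathbb{E}}_{x,x'}\big[ W_n^{(0)} (1-\epsilon)^{N_{n-1}} \big] + (1-\rho)\, \check{\mathbb{E}}_{x,x'}\big[ W_n^{(1)} (1-\epsilon)^{N_{n-1}} \big] . $$

We now have to compute bounds for
$\check{\mathbb{E}}_{x,x'}\big[ W_n^{(i)} (1-\epsilon)^{N_{n-1}} \big]$,
$i = 0, 1$. Define

$$ T_n^{(0)} := \prod_{k=0}^{n-1} \Big( 1 + \frac{b_U\, r(k)\, 1_{C \times C}(X_k, X'_k)}{W_k^{(0)}} \Big)^{-1} , \quad T_n^{(1)} := \prod_{k=0}^{n-1} \Big( 1 + \frac{b_V\, 1_{C \times C}(X_k, X'_k)}{W_k^{(1)}} \Big)^{-1} . \tag{4.5} $$

If $\epsilon = 1$, $(1-\epsilon)^{N_{n-1}} = 1_{\{\sigma_0 \ge n\}}$, where
$\sigma_0 = \inf\{n \ge 0 \mid (X_n, X'_n) \in C \times C\}$ is the first
hitting time of the set $C \times C$:
$(T_n^{(i)})^{-1} 1_{\{\sigma_0 \ge n\}} = 1_{\{\sigma_0 \ge n\}} \le 1$.
Consider now the case $\epsilon < 1$. By construction, for $N_{n-1} = 0$,
$T_n^{(i)} = 1$ and for $N_{n-1} > 0$,

$$ T_n^{(0)} = \prod_{j=0}^{N_{n-1} - 1} \Big( 1 + \frac{b_U\, r(\sigma_j)}{W_{\sigma_j}^{(0)}} \Big)^{-1} , \quad T_n^{(1)} = \prod_{j=0}^{N_{n-1} - 1} \Big( 1 + \frac{b_V}{W_{\sigma_j}^{(1)}} \Big)^{-1} , $$

where $\sigma_j$ are the successive hitting times of the set $C \times C$,
recursively defined by
$\sigma_{j+1} = \inf\{n > \sigma_j \mid (X_n, X'_n) \in C \times C\}$. Because
$W_n^{(0)} \ge R(n+1) + M_U$ and
$1 + b_U r(n)/\{R(n+1) + M_U\} \le 1/(1-\epsilon)$, for $N_{n-1} > 0$, we have
$(T_n^{(0)})^{-1} (1-\epsilon)^{N_{n-1}} \le 1$. Similarly, because
$W_n^{(1)} \ge M_V$ and $1 + b_V/M_V \le 1/(1-\epsilon)$, we have
$(T_n^{(1)})^{-1} (1-\epsilon)^{N_{n-1}} \le 1$. These two relations imply, for
$i = 0, 1$,

$$ \check{\mathbb{E}}_{x,x'}\big[ W_n^{(i)} (1-\epsilon)^{N_{n-1}} \big] \le \check{\mathbb{E}}_{x,x'}\big[ W_n^{(i)} T_n^{(i)} \big] . $$

It remains now to compute a bound for
$\check{\mathbb{E}}_{x,x'}\big[ W_n^{(i)} T_n^{(i)} \big]$. By construction,
we have for $n \ge 1$

$$ \check{\mathbb{E}}_{x,x'}\big[ W_n^{(0)} T_n^{(0)} \mid \mathcal{F}_{n-1} \big] = T_{n-1}^{(0)}\, \frac{W_{n-1}^{(0)}}{W_{n-1}^{(0)} + b_U\, r(n-1)\, 1_{C \times C}(X_{n-1}, X'_{n-1})}\, \check{\mathbb{E}}_{x,x'}\big[ W_n^{(0)} \mid \mathcal{F}_{n-1} \big] , \tag{4.8} $$

where $\mathcal{F}_n = \sigma\{(X_0, X'_0), \dots, (X_n, X'_n)\}$. Now, (4.4)
yields:

$$ \check{\mathbb{E}}_{x,x'}\big[ W_n^{(0)} \mid \mathcal{F}_{n-1} \big] \le W_{n-1}^{(0)} + b_U\, r(n-1)\, 1_{C \times C}(X_{n-1}, X'_{n-1}) . \tag{4.9} $$

Combining (4.8) and (4.9) shows that $\{W_n^{(0)} T_n^{(0)}\}_{n \ge 0}$ is an
$\mathcal{F}$-supermartingale. Thus,

$$ \check{\mathbb{E}}_{x,x'}\big[ W_n^{(0)} (1-\epsilon)^{N_{n-1}} \big] \le \check{\mathbb{E}}_{x,x'}\big[ W_0^{(0)} T_0^{(0)} \big] = U(x,x') + M_U . $$

Similarly,
$\check{\mathbb{E}}_{x,x'}\big[ W_n^{(1)} (1-\epsilon)^{N_{n-1}} \big] \le V(x,x') + M_V$,
which concludes the proof of Theorem 2.1.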

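5. Proof of Proposition 2.2 and Theorem 2.3

Proof of Proposition 2.2. By applying the comparison theorem (Meyn and Tweedie,
1993) and (Douc et al., 2004a, Proposition 2.2), we obtain the following
inequalities: for all $(x,x') \in X \times X$,

$$ \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\tau_{C \times C} - 1} \phi \circ H_\phi^{-1}(k) \Big] \le W(x,x') - 1 + b\, \frac{\phi \circ H_\phi^{-1}(1)}{\phi \circ H_\phi^{-1}(0)}\, 1_{C \times C}(x,x') , \tag{5.1} $$
$$ \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\tau_{C \times C} - 1} \phi \circ W(X_k, X'_k) \Big] \le W(x,x') + b\, 1_{C \times C}(x,x') . \tag{5.2} $$

The sequence $\{\phi \circ H_\phi^{-1}(k)\}_{k \ge 0}$ is log-concave.
Therefore, for any $k \ge 0$,
$\phi \circ H_\phi^{-1}(k+1)/\phi \circ H_\phi^{-1}(k) \le \phi \circ H_\phi^{-1}(1)/\phi \circ H_\phi^{-1}(0)$.
Then, applying (5.1), we obtain:

$$ U(x,x') = \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C}} r_\phi(k) \Big] = 1 + \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=1}^{\sigma_{C \times C}} r_\phi(k) \Big] 1_{(C \times C)^c}(x,x') $$
$$ \le 1 + \frac{\phi \circ H_\phi^{-1}(1)}{\{\phi \circ H_\phi^{-1}(0)\}^2}\, \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=1}^{\tau_{C \times C}} \phi \circ H_\phi^{-1}(k-1) \Big] 1_{(C \times C)^c}(x,x') \le 1 + \frac{r_\phi(1)}{\phi(1)} \{W(x,x') - 1\}\, 1_{(C \times C)^c}(x,x') , $$

showing (2.16). Similarly,

$$ V(x,x') = \phi \circ W(x,x')\, 1_{C \times C}(x,x') + \check{\mathbb{E}}_{x,x'}\Big[ \sum_{k=0}^{\sigma_{C \times C} - 1} \phi \circ W(X_k, X'_k) \Big] 1_{(C \times C)^c}(x,x') + \check{\mathbb{E}}_{x,x'}\big[ \phi \circ W(X_{\sigma_{C \times C}}, X'_{\sigma_{C \times C}}) \big] 1_{(C \times C)^c}(x,x') , $$

showing (2.17). □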



Proof of Theorem 2.3. Since $d_0 = \inf_{x \notin C} W_0(x)$, if
$(x,x') \notin C \times C$, then $W(x,x') \ge d_0$ and
$1_C(x) + 1_C(x') \le 1$, since either $x \notin C$ or $x' \notin C$ (or both).
The definition of the kernel $\check P$ therefore implies

$$ \check P W(x,x') \le W_0(x) + W_0(x') - 1 - \phi_0 \circ W_0(x) - \phi_0 \circ W_0(x') + b_0 \{1_C(x) + 1_C(x')\} \le W(x,x') - \phi_0 \circ W(x,x') + b_0 , $$

where we have used the inequality: for any $u \ge 1$ and $v \ge 1$,
$\phi_0(u + v - 1) - \phi_0(u) \le \phi_0(v) - \phi_0(1)$. For
$(x,x') \notin C \times C$,
$b_0 \le (1-\lambda)\phi_0(d_0) \le (1-\lambda)\, \phi_0 \circ W(x,x')$ and the
previous inequality implies
$\check P W(x,x') \le W(x,x') - \phi \circ W(x,x')$. □



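Appendix A. Proof of Proposition 3.1

Let $W$ be any measurable non-negative function on $X$. Then, for $\eta > 0$
and $x \notin C_\eta$,

$$ P W(x) - W(x) = \int_X a(x,y) \{W(y) - W(x)\}\, q(y)\, \mu(dy) = \int_X \Big( \frac{q(x)}{\pi(x)} \wedge \frac{q(y)}{\pi(y)} \Big) W(y)\, \pi(y)\, \mu(dy) - W(x) \int_X a(x,y)\, q(y)\, \mu(dy) . $$

If $x \notin C_\eta$ and $y \in C_\eta$, then $y \preceq x$ and
$a(x,y) q(y) = (q(x)/\pi(x))\, \pi(y)$. Thus, we have:

$$ \int_X a(x,y)\, q(y)\, \mu(dy) \ge \int_{C_\eta} a(x,y)\, q(y)\, \mu(dy) = \frac{q(x)}{\pi(x)}\, \pi(C_\eta) = \frac{q(x)}{\pi(x)}\, (1 - \psi(\eta)) . $$

Altogether, we obtain, for all $x \notin C_\eta$:

$$ P W(x) - W(x) \le \int_X \Big( \eta \wedge \frac{q(y)}{\pi(y)} \Big) W(y)\, \pi(y)\, \mu(dy) - (1 - \psi(\eta))\, \frac{q(x)}{\pi(x)}\, W(x) . \tag{A.1} $$

Applying the definition of $W_0$, we now have:

$$ \int_X \Big( \eta \wedge \frac{q(y)}{\pi(y)} \Big) W_0(y)\, \pi(y)\, \mu(dy) = \int_X \Big( \eta \wedge \frac{q(y)}{\pi(y)} \Big) K\Big( \frac{q(y)}{\pi(y)} \Big)\, \pi(y)\, \mu(dy) = \int_0^\infty (\eta \wedge u) K(u)\, d\psi(u) < \infty . \tag{A.2} $$

By Lebesgue's bounded convergence theorem,
$\lim_{\eta \to 0} \int_0^\infty (\eta \wedge u) K(u)\, d\psi(u) = 0$. Since
moreover $\lim_{\eta \to 0} \psi(\eta) = 0$, for $\eta$ small enough,
$\{1 - \psi(\eta)\}\, \phi(1) > \int_0^\infty (\eta \wedge u) K(u)\, d\psi(u)$,
hence $\eta^*$ is well defined. Now, (A.1) and (A.2) yield, for all
$x \notin C_{\eta^*}$,

$$ P W_0(x) - W_0(x) \le \int_0^\infty (\eta^* \wedge u) K(u)\, d\psi(u) - (1 - \psi(\eta^*))\, W_0(x)\, K^{-1} \circ W_0(x) = -\phi_0(W_0(x)) . $$

For $x \in C_{\eta^*}$, we have $W_0(x) \le K(\eta^*)$. Finally, we have, for
any $x \in C_{\eta^*}$,

$$ P W_0(x) \le \int_X q(y) W_0(y)\, \mu(dy) + W_0(x) = \int_X \frac{q(y)}{\pi(y)}\, K\Big( \frac{q(y)}{\pi(y)} \Big)\, \pi(y)\, \mu(dy) + W_0(x) \le \int_0^\infty u K(u)\, d\psi(u) + K(\eta^*) . $$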
Figure 1. Convergence bound for the total variation distance in
the light traffic case: $\rho = 0.5$, $\alpha = 2.5$.

Figure 2. Convergence bound for the total variation distance in
the heavy traffic case: $\rho = 0.9$, $\alpha = 2.5$.

Figure 3. Convergence bound for the total variation distance for
the independence sampler with $q(x) = 3x^2$.

Figure 4. Convergence bound for the total variation distance
when $q(x) = 1.5\sqrt{x}$.